Keegan Smith

11/07/23

Comp Arch & Design

1. Consider the following instruction:

AND $t1, $t2, $t3

* 1. What are the values of control signals generated by the control in Figure 4.2 for the above instruction?
       - ALU operation = AND
       - RegWrite = 1
       - MemWrite = 0
       - MemRead = 0
       - ALUSrc = 0
       - RegDst = 1
       - MemtoReg = 0
       - Branch = 0
  2. Which resources (pipeline units) are used for performing this instruction?

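The PC and the instruction memory fetch the instruction, and the PC + 4 adder computes the next PC. The register file reads $t2 and $t3, the ALU performs the AND operation, and the register file's write port stores the result in $t1. The ALUSrc and MemtoReg multiplexors route the register operand into the ALU and the ALU result back to the register file.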

* 1. Which resources produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

The branch adder and the sign-extend unit produce outputs that are not used: the branch target is discarded because Branch = 0, and the sign-extended immediate is not selected because ALUSrc = 0. Inside the ALU, the results of the other operations are not selected by the ALU control lines.

The data memory produces no output, since MemRead = 0 and MemWrite = 0.

1. The basic single-cycle MIPS implementation in Figure 4.2 above can only implement some instructions. New instructions can be added to an existing ISA, but the decision depends on the cost and complexity the proposed addition introduces into the processor datapath and control. The following three questions refer to the new instruction:

LWI Rt, Rd (Rs) ## Reg[Rt] = Mem[ Reg[Rd] + Reg[Rs] ]

* 1. Which existing blocks (if any) can be used for this instruction?

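All of the existing blocks can be used. The instruction memory fetches the instruction, both register read ports supply Reg[Rd] and Reg[Rs], the ALU adds them to form the address, the data memory reads the word, and the register write port stores it in Rt.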
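A minimal Python sketch of LWI running on the existing blocks, assuming the register file is modeled as a 32-entry list and data memory as a dict keyed by byte address. The function name `lwi` and this model are hypothetical, not part of any simulator.

```python
from typing import Dict, List


def lwi(regs: List[int], mem: Dict[int, int], rt: int, rd: int, rs: int) -> None:
    """Hypothetical model of LWI Rt, Rd(Rs): Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]].

    Each step maps onto a block that already exists in the single-cycle datapath.
    """
    a = regs[rd]                      # register file, read port 1
    b = regs[rs]                      # register file, read port 2 (ALUSrc selects this, not the immediate)
    addr = (a + b) & 0xFFFFFFFF       # ALU performs a 32-bit add
    if addr % 4 != 0:
        raise ValueError("word loads need a word-aligned address")
    data = mem[addr]                  # data memory read (MemRead = 1)
    if rt != 0:                       # $zero is hard-wired to 0 in MIPS
        regs[rt] = data               # register file write port (MemtoReg selects memory data)


if __name__ == "__main__":
    regs = [0] * 32
    regs[9] = 0x1000     # $t1 holds a base address
    regs[10] = 8         # $t2 holds an offset
    mem = {0x1008: 42}
    lwi(regs, mem, rt=8, rd=9, rs=10)   # LWI $t0, $t1($t2)
    print(regs[8])       # 42
```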

* 1. Which new functional blocks (if any) do we need for this instruction?

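No new functional blocks are needed. The ALU already exists and can perform the address addition; the only difference from lw is that the second ALU operand comes from the register file instead of the sign-extended immediate.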

* 1. What new control signals do we need (if any) to support this instruction?

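No new control signals are needed. The existing signals only need a new combination of values: RegWrite = 1, MemRead = 1, MemWrite = 0, ALUSrc = 0 (register operand), ALU operation = add, MemtoReg = 1, RegDst = 0 (Rt is the destination, as in lw), and Branch = 0. Only the control logic that decodes the opcode has to change to generate this combination.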
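One way to check that no new control signals are needed is to write the control settings out as a table, sketched here in Python. The signal names follow the textbook's single-cycle control; the LW row is the standard load, the LWI row is an assumed setting for the new instruction, and `EXISTING_SIGNALS`, `CONTROL`, and the helper functions are hypothetical names.

```python
# Control signals already produced by the single-cycle control unit.
EXISTING_SIGNALS = {"RegDst", "ALUSrc", "MemtoReg", "RegWrite",
                    "MemRead", "MemWrite", "Branch", "ALUOp"}

# Assumed control settings; the LWI row is hypothetical.
CONTROL = {
    "AND": {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,
            "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": "and"},
    "LW":  {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,
            "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": "add"},
    "LWI": {"RegDst": 0, "ALUSrc": 0, "MemtoReg": 1, "RegWrite": 1,
            "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": "add"},
}


def missing_signals(instr: str) -> set:
    """Signals an instruction needs that the control unit does not already have."""
    return set(CONTROL[instr]) - EXISTING_SIGNALS


def differences(a: str, b: str) -> dict:
    """Signals whose values differ between two instructions, as (a, b) pairs."""
    return {s: (CONTROL[a][s], CONTROL[b][s])
            for s in CONTROL[a] if CONTROL[a][s] != CONTROL[b][s]}


if __name__ == "__main__":
    print(missing_signals("LWI"))       # set()
    print(differences("LW", "LWI"))     # {'ALUSrc': (1, 0)}
```

Under these assumptions LWI differs from lw only in ALUSrc, so the change is confined to the control unit's decoding.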

1. When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a datapath from Figure 4.2, where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 250 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively.

| Block | I-Mem | Add | Mux | ALU | Regs | D-Mem | Control |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Latency (ps) | 400 | 100 | 30 | 120 | 250 | 350 | 100 |
| Cost | 1000 | 30 | 10 | 100 | 200 | 2000 | 500 |

Consider the addition of a multiplier to the ALU. This addition ***will add 300 ps to the latency of the ALU and will add a cost of 600 to the ALU***. ***The result will be 5% fewer instructions executed*** since we will no longer need to emulate the MUL instruction.

* 1. What is the clock cycle time with and without this improvement?

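Clock cycle time = latency of the critical path, which is the load-word path (Control, at 100 ps, runs in parallel with the 250 ps register read, so Regs is on the path)

= I-Mem + Regs + Mux + ALU + D-Mem + Mux

Without improvement = 400 + 250 + 30 + 120 + 350 + 30 = 1180 ps

With improvement = 1180 + 300 = 1480 ps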
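A small Python sketch of the critical-path calculation, using the latencies from the table. It assumes the register write and the PC update add no setup time, that Control runs in parallel with the register read, and it ignores the AND gate on the branch path; the path list and the function name `critical_path` are illustrative.

```python
LAT = {"I-Mem": 400, "Add": 100, "Mux": 30, "ALU": 120,
       "Regs": 250, "D-Mem": 350, "Control": 100}


def critical_path(lat: dict, alu_extra: int = 0) -> tuple:
    """Return (instruction class, latency in ps) of the slowest path."""
    alu = lat["ALU"] + alu_extra
    decode = max(lat["Regs"], lat["Control"])        # register read and control in parallel
    front = lat["I-Mem"] + decode + lat["Mux"] + alu  # fetch, read, ALUSrc mux, ALU
    paths = {
        "R-type": front + lat["Mux"],                 # MemtoReg mux to register write
        "lw":     front + lat["D-Mem"] + lat["Mux"],  # memory read, then MemtoReg mux
        "sw":     front + lat["D-Mem"],               # memory write
        "beq":    front + lat["Mux"],                 # Zero output selects the PC mux
    }
    name = max(paths, key=paths.get)
    return name, paths[name]


if __name__ == "__main__":
    print(critical_path(LAT))        # ('lw', 1180)
    print(critical_path(LAT, 300))   # ('lw', 1480)
```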

* 1. What is the speedup achieved by adding this improvement?

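Speedup = Execution time without / Execution time with, where Execution time = Instruction count × CPI × Cycle time and CPI = 1 for a single-cycle datapath

= (1.00 × 1180) / (0.95 × 1480)

= 1180 / 1406

Speedup ≈ 0.84, so the longer clock cycle outweighs the 5% reduction in instructions and the change is actually a slowdown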
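A Python sketch of the speedup, assuming CPI = 1 (single-cycle datapath) and critical-path cycle times of 1180 ps without the multiplier (the load-word path, 400 + 250 + 30 + 120 + 350 + 30 ps) and 1480 ps with it (300 ps more in the ALU). `exec_time` is a hypothetical helper.

```python
def exec_time(instr_count: float, cpi: float, cycle_ps: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instr_count * cpi * cycle_ps


base = exec_time(1.00, 1, 1180)       # original program, normalized to 1 instruction
improved = exec_time(0.95, 1, 1480)   # 5% fewer instructions, longer cycle
speedup = base / improved

if __name__ == "__main__":
    print(f"{speedup:.3f}")   # 0.839
```

A value below 1 means the multiplier makes this datapath slower overall.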

* 1. Compare the cost/performance ratio with and without this improvement.

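Cost without improvement = sum of block costs (one of each block in the table)

= 1000 + 30 + 10 + 100 + 200 + 2000 + 500 = 3840

Cost with improvement = 3840 + 600 = 4440

Performance is the inverse of execution time, so cost/performance = cost × execution time (per original instruction):

Without improvement = 3840 × 1180 = 4,531,200

With improvement = 4440 × (0.95 × 1480) = 4440 × 1406 = 6,242,640

Ratio = 6,242,640 / 4,531,200 ≈ 1.38, so the cost/performance ratio is about 38% worse with the multiplier.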
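A Python sketch of the cost/performance comparison, assuming one of each block as in the table, performance = 1 / execution time, CPI = 1, and cycle times of 1180 ps without and 1480 ps with the multiplier (the load-word critical path plus the 300 ps ALU increase). The dictionary and variable names are illustrative.

```python
COST = {"I-Mem": 1000, "Add": 30, "Mux": 10, "ALU": 100,
        "Regs": 200, "D-Mem": 2000, "Control": 500}
MULTIPLIER_COST = 600

cost_without = sum(COST.values())              # 3840
cost_with = cost_without + MULTIPLIER_COST     # 4440

# Execution time per original instruction (CPI = 1).
time_without = 1.00 * 1180
time_with = 0.95 * 1480                        # about 1406 ps

# cost / performance = cost / (1 / time) = cost * time; lower is better.
cp_without = cost_without * time_without
cp_with = cost_with * time_with
ratio = cp_with / cp_without

if __name__ == "__main__":
    print(cost_without, cost_with)   # 3840 4440
    print(f"{ratio:.2f}")            # 1.38
```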

1. Problems in this exercise assume that logic blocks needed to implement a processor’s datapath have the following latencies:

| Block | I-Mem | Add | Mux | ALU | Regs | D-Mem | Sign-extend | Shift-left-2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Latency (ps) | 220 | 70 | 20 | 90 | 90 | 250 | 15 | 10 |

* 1. If the only thing we need to do in a processor is fetch consecutive instructions (Figure 4.6 in the textbook), what would the cycle time be?

The cycle time would be 220 ps. The instruction memory read (220 ps) and the PC + 4 adder (70 ps) work in parallel, so the slower I-Mem sets the cycle time.
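A Python sketch that treats the cycle time as the slowest of several parallel paths, each a list of blocks in series. It assumes the PC register itself adds no delay; `cycle_time` and the path lists are hypothetical names.

```python
LAT = {"I-Mem": 220, "Add": 70, "Mux": 20, "ALU": 90, "Regs": 90,
       "D-Mem": 250, "Sign-extend": 15, "Shift-left-2": 10}


def cycle_time(paths, lat):
    """Slowest parallel path, where each path is a list of blocks in series."""
    return max(sum(lat[block] for block in path) for path in paths)


# Fetch only: the instruction read and the PC + 4 adder both start from the PC.
fetch_paths = [["I-Mem"], ["Add"]]

if __name__ == "__main__":
    print(cycle_time(fetch_paths, LAT))   # 220
```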

* 1. Consider a datapath similar to the one in Figure 4.11, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?

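The critical path for a PC-relative branch is: I-Mem -> Sign-ext -> Shift-left-2 -> Add -> Mux. The path through the PC + 4 adder into the branch adder is shorter, because Add (70 ps) is faster than I-Mem (220 ps).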

Cycle time = 220 + 15 + 10 + 70 + 20

Cycle time = 335 ps
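The same parallel-path idea applied to the branch-only datapath, as a self-contained Python sketch. The two paths listed (the branch-target path through I-Mem, and the PC + 4 adder feeding the branch adder) are an assumption about the datapath's wiring, and `cycle_time` and `branch_paths` are hypothetical names.

```python
LAT = {"I-Mem": 220, "Add": 70, "Mux": 20, "ALU": 90, "Regs": 90,
       "D-Mem": 250, "Sign-extend": 15, "Shift-left-2": 10}


def cycle_time(paths, lat):
    """Slowest parallel path, where each path is a list of blocks in series."""
    return max(sum(lat[block] for block in path) for path in paths)


branch_paths = [
    # offset from the instruction: fetch, sign-extend, shift, branch adder, PC mux
    ["I-Mem", "Sign-extend", "Shift-left-2", "Add", "Mux"],
    # PC + 4 feeding the branch adder, then the PC mux
    ["Add", "Add", "Mux"],
]

if __name__ == "__main__":
    print(cycle_time(branch_paths, LAT))   # 335
```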